Reinforcement Learning Based on a Bayesian Confidence Propagating Neural Network
نویسندگان
چکیده
We present a system capable of reinforcement learning (RL) based on the Bayesian confidence propagating neural network (BCPNN). The system is called BCPNNRL and its architecture is somewhat motivated by parallels to biology. We analyze the systems properties and we benchmark it against a simple Monte Carlo (MC) based RL algorithm, pursuit RL methods, and the Associative Reward Penalty (AR-P) algorithm. The system is used to solve the n-armed bandit problem, pattern association, and path finding in a maze.
منابع مشابه
Propagating Uncertainty in Multi-Stage Bayesian Convolutional Neural Networks with Application to Pulmonary Nodule Detection
Motivated by the problem of computer-aided detection (CAD) of pulmonary nodules, we introduce methods to propagate and fuse uncertainty information in a multi-stage Bayesian convolutional neural network (CNN) architecture. The question we seek to answer is “can we take advantage of the model uncertainty provided by one deep learning model to improve the performance of the subsequent deep learni...
متن کاملReinforcement Learning in Neural Networks: A Survey
In recent years, researches on reinforcement learning (RL) have focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...
متن کاملReinforcement Learning in Neural Networks: A Survey
In recent years, researches on reinforcement learning (RL) have focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...
متن کاملPractical Hyperparameter Optimization
Recently, the bandit-based strategy Hyperband (HB) was shown to yield good hyperparameter settings of deep neural networks faster than vanilla Bayesian optimization (BO). However, for larger budgets, HB is limited by its random search component, and BO works better. We propose to combine the benefits of both approaches to obtain a new practical state-of-the-art hyperparameter optimization metho...
متن کاملAn Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network
RoboCup competition as a great test-bed, has turned to a worldwide popular domains in recent years. The main object of such competitions is to deal with complex behavior of systems whichconsist of multiple autonomous agents. The rich experience of human soccer player can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...
متن کامل